Designing q-Unique DNA Sequences with Integer Linear Programs and Euler Tours in De Bruijn Graphs
Abstract
DNA nanoarchitectures require carefully designed oligonucleotides with certain non-hybridization guarantees, which can be formalized as the q-uniqueness property at the sequence level. We study the optimization problem of finding a longest q-unique DNA sequence. We first present a convenient formulation as an integer linear program on the underlying De Bruijn graph that allows a variety of constraints to be incorporated flexibly; solution times for practically relevant values of q are short. We then provide additional insights into the problem structure using the quotient graph of the De Bruijn graph with respect to the equivalence relation of reverse complementarity. Specifically, for odd q the quotient graph is Eulerian, and finding a longest q-unique sequence is equivalent to finding an Euler tour, so the problem is solved in linear time (with respect to the output string length). For even q, self-complementary edges complicate the problem, and the graph has to be Eulerized by deleting a minimum number of edges. Two sub-cases arise, for one of which we present a complete solution, while the other one remains open.
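The abstract does not spell out the q-uniqueness property, so the following is only a minimal sketch under an assumed definition: a sequence is taken to be q-unique if no length-q substring occurs twice and no length-q substring is the reverse complement of any length-q substring in the sequence (itself included, so self-complementary q-grams are rejected). The function names `reverse_complement` and `is_q_unique` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Sketch of a q-uniqueness check under an ASSUMED definition (see lead-in):
# no q-gram repeats, and no q-gram's reverse complement occurs in the sequence.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(s: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(s))


def is_q_unique(seq: str, q: int) -> bool:
    """Check the assumed q-uniqueness property by scanning all q-grams once."""
    seen = set()
    for i in range(len(seq) - q + 1):
        gram = seq[i:i + q]
        rc = reverse_complement(gram)
        # Assumption: a self-complementary q-gram (gram == rc) is a violation.
        if gram in seen or rc in seen or gram == rc:
            return False
        seen.add(gram)
    return True
```

Under this assumed definition, `is_q_unique("AACA", 2)` holds, while `is_q_unique("AACGTT", 3)` does not, because the q-gram `CGT` is the reverse complement of the earlier q-gram `ACG`. Each q-gram corresponds to an edge of the De Bruijn graph, which is why the check amounts to using at most one edge from each reverse-complement class.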
Similar resources
The Number of Euler Tours of Random Directed Graphs
In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random d-in/d-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler...
The number of Euler tours of a random directed graph
In this paper we obtain the expectation and variance of the number of Euler tours of a random Eulerian directed graph with fixed out-degree sequence. We use this to obtain the asymptotic distribution of the number of Euler tours of a random d-in/d-out graph and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler...
Reciprocals of Binary Power Series
If A is a set of nonnegative integers containing 0, then there is a unique nonempty set B of nonnegative integers such that every positive integer can be written in the form a + b, where a ∈ A and b ∈ B, in an even number of ways. We compute the natural density of B for several specific sets A, including the Prouhet-Thue-Morse sequence, {0} ∪ {2^n : n ∈ N}, and random sets, and we also study the ...
The number of Euler tours of a random d-in/d-out graph
In this paper we obtain the expectation and variance of the number of Euler tours of a random d-in/d-out directed graph, for d ≥ 2. We use this to obtain the asymptotic distribution and prove a concentration result. We are then able to show that a very simple approach for uniform sampling or approximately counting Euler tours yields algorithms running in expected polynomial time for almost ever...
Reducing Genome Assembly Complexity with Optical Maps Final Report
The goal of genome assembly is to reconstruct contiguous portions of a genome (known as contigs) given short reads of DNA sequence obtained in a sequencing experiment. De Bruijn graphs are constructed by finding overlaps of length k − 2 between all substrings of length k − 1 from reads of at least k bases, resulting in a graph where the correct reconstruction of the genome is given by one of th...
Publication date: 2012